Mining Wikipedia's Snippets Graph - First Step to Build a New Knowledge Base

Authors

  • Andias Wira-Alam
  • Brigitte Mathiak
Abstract

In this paper, we discuss aspects of mining links and text snippets from Wikipedia to build a new knowledge base. Current knowledge bases, e.g. DBpedia [1], mainly cover the structured part of Wikipedia, but not the content as a whole. As a complement, we focus on extracting information from the text of the articles. We extract a database of the hyperlinks between Wikipedia articles and annotate each hyperlink with the textual context surrounding it. This would be useful for network analysis, e.g. to measure the influence of one topic on another, or directly for question answering (for stating the relationship between two entities). First, we describe the technical aspects of extracting the data from Wikipedia. Second, we specify how to represent the extracted data as extended triples through a Web service. Finally, we discuss the uses we expect and the challenges.
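The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes raw wikitext as input, a fixed character window as the "textual context", and a plain (source, target, snippet) record as the extended triple. The names `SnippetTriple`, `extract_snippets`, and `window` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches [[Target]], [[Target#Section]] and [[Target|label]] wikilinks.
# Assumption: simple, non-nested links only (no templates or file links).
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


class SnippetTriple(NamedTuple):
    """Hypothetical extended triple: a link plus its surrounding text."""
    source: str   # title of the article containing the link
    target: str   # title of the linked article
    snippet: str  # text surrounding the link


def extract_snippets(source_title: str, wikitext: str,
                     window: int = 60) -> List[SnippetTriple]:
    """Return one triple per wikilink, with `window` characters of context
    on each side. Link markup inside the context is replaced by its label.
    A link cut off at the window edge is left as raw markup."""
    triples = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        start = max(0, match.start() - window)
        end = min(len(wikitext), match.end() + window)
        context = wikitext[start:end]
        # Show the visible label if present, otherwise the target title.
        context = WIKILINK.sub(lambda m: m.group(2) or m.group(1), context)
        snippet = " ".join(context.split())  # normalize whitespace
        triples.append(SnippetTriple(source_title, target, snippet))
    return triples


if __name__ == "__main__":
    text = ("Berlin is the capital of [[Germany]] and lies on the "
            "[[Spree|river Spree]].")
    for triple in extract_snippets("Berlin", text, window=200):
        print(triple)
```

A character window is the simplest possible notion of context; a sentence- or paragraph-based boundary would likely give cleaner snippets, at the cost of needing a sentence splitter.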


Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

As the Web grows, there are many challenges in establishing a general framework for data mining and for retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. An ontology captures the main entities and concepts of the data in data mining. In this paper, we propose a method for applying the "meaning" of the search system, but the problem ...

Linked Open Graph: browsing multiple SPARQL entry points to build your own LOD views

A number of accessible RDF stores are populating the linked open data world. Navigating the reticular relationships among data is becoming more relevant every day. Several knowledge bases present relevant links to common vocabularies, while many others remain to be discovered, increasing the reasoning capabilities of our knowledge base applications. In this paper, the Linked Open Graph, LOG, is pr...

How to Semantically Enhance a Data Mining Process?

This paper presents the KEOPS data mining methodology, centered on domain knowledge integration. KEOPS is a CRISP-DM-compliant methodology which integrates a knowledge base and an ontology. In this paper, we focus first on the pre-processing steps of business understanding and data understanding in order to build an ontology-driven information system (ODIS). Then we show how the knowledge base i...

Exploring Wikipedia's Category Graph for Query Classification

Wikipedia’s category graph is a network of 400,000 interconnected category labels, and can be a powerful resource for many classification tasks. However, its size and the lack of order can make it difficult to navigate. In this paper, we present a new algorithm to efficiently explore this graph and discover accurate classification labels. We implement our algorithm as the core of a query classi...

Publication date: 2012